Indexed-based density biased sampling for clustering applications

Authors

Alexandros Nanopoulos

Yannis Theodoridis

Yannis Manolopoulos

Abstract

Density Biased Sampling (DBS) has been proposed to address the limitations of uniform sampling by producing the desired probability distribution in the sample. The ease of producing a random sample depends on the mechanism available for accessing the elements of the dataset. Existing DBS algorithms sample over flat files. In this paper we develop a new method that exploits spatial indexes, and the local density information they preserve, to provide a high-quality sample and fast access to the elements of the dataset. The proposed method produces accurate density estimates with respect to factors such as skew, noise, and dimensionality. Moreover, it attains a significant improvement in sampling time. The performance of the proposed method is examined analytically and experimentally, and the comparative results illustrate its superiority over existing methods.
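The abstract does not spell out a sampling rule, so the following is only a minimal sketch of density-biased sampling in general, not the index-based method of the paper. It assumes that local density is approximated by counting points per cell of a uniform grid, and that each point is kept with probability proportional to that count raised to an exponent. The names `grid_cell`, `inclusion_probabilities`, `density_biased_sample`, `exponent`, and `cell_size` are all hypothetical.

```python
import random
from collections import Counter


def grid_cell(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(coord // cell_size) for coord in point)


def inclusion_probabilities(points, target_size, exponent, cell_size):
    """Return one inclusion probability per point.

    Assumption of this sketch: the density around a point is the number of
    points that share its grid cell. A point in a cell with `count` points
    gets weight count**exponent, and the weights are scaled so that the
    expected sample size is about `target_size` (capped at probability 1).
    """
    cells = [grid_cell(p, cell_size) for p in points]
    counts = Counter(cells)
    weights = [counts[c] ** exponent for c in cells]
    total = sum(weights)
    return [min(1.0, target_size * w / total) for w in weights]


def density_biased_sample(points, target_size, exponent, cell_size, seed=None):
    """Keep each point independently with its inclusion probability."""
    rng = random.Random(seed)
    probs = inclusion_probabilities(points, target_size, exponent, cell_size)
    return [p for p, prob in zip(points, probs) if rng.random() < prob]
```

Under these assumptions, `exponent = 0` reduces to uniform sampling, a negative exponent oversamples sparse regions, and a positive exponent oversamples dense ones. A flat-file implementation has to scan the data to obtain the counts; the paper's point is that a spatial index already preserves local density information of this kind.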


Similar resources

Indexed-based density biased sampling for clustering applications I

Density biased sampling (DBS) has been proposed to address the limitations of Uniform sampling, by producing the desired probability distribution in the sample. The ease of producing a random sample depends on the available mechanism for accessing the elements of the dataset. Existing DBS algorithms perform sampling over flat files. In this paper, we develop a new method that exploits spatial i...


Some Asymptotic Results of Kernel Density Estimator in Length-Biased Sampling

In this paper, we prove the strong uniform consistency and asymptotic normality of the kernel density estimator proposed by Jones [12] for length-biased data. The approach is based on the invariance principle for the empirical processes proved by Horváth [10]. Simulations are run for different cases to demonstrate both consistency and asymptotic normality, and the method is illustrated by ...


Efficient Biased Sampling for Approximate Clustering and Outlier Detection in Large Data Sets

We investigate the use of biased sampling according to the density of the data set to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional data sets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the data set. We propose a general technique for densi...


An Efficient Approximation Scheme for Data Mining Tasks

We investigate the use of biased sampling according to the density of the dataset, to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional datasets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the dataset. We propose a general technique for density-...


Weighted K-Means for Density-Biased Clustering

Clustering is the task of grouping data based on similarity. The popular k-means algorithm groups data by first assigning each data point to its closest cluster, then recomputing the cluster means. The algorithm repeats these two steps until it has converged. We propose a variation called weighted k-means to improve the clustering scalability. To speed up the clustering process, we develop the r...
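The two alternating k-means steps described in this snippet can be sketched as follows. Because the snippet is truncated, the weighting shown here is an assumption rather than the authors' algorithm: each point carries a weight (for example, the number of original points a sampled point stands for), and cluster means become weighted means. The function names and parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iterations=100, seed=None):
    """Lloyd-style k-means where each point contributes in proportion to its weight."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    dim = len(points[0])
    assignment = None
    for _ in range(iterations):
        # Step 1: assign every point to its closest center.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Step 2: move each center to the weighted mean of its members.
        for j in range(k):
            members = [
                (p, w) for p, w, a in zip(points, weights, assignment) if a == j
            ]
            total = sum(w for _, w in members)
            if total > 0:  # leave the center of an empty cluster where it is
                centers[j] = tuple(
                    sum(w * p[d] for p, w in members) / total for d in range(dim)
                )
    return centers, assignment
```

With all weights equal to 1 this reduces to ordinary k-means, so the weights are the only departure from the textbook procedure.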


Journal:

Data Knowl. Eng.

Volume 57, Issue -

Pages -

Publication date: 2006
